o 

(N 



oo 



Partially linear models on Riemannian manifolds 

Wenceslao Gonzalez-Manteiga 1 , Guillermo Henry 2 and Daniela Rodriguez 2 
1 Universidad dc Santiago de Compostela, Spain 
2 Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina 

Abstract 

In partially linear models the dependence of the response y on (x T ,i) is modeled through the relationship 
y = x T /3 + g(t) + e where e is independent of (x T , t). In this paper, estimators of /3 and g are constructed when 
the explanatory variables t take values on a Riemannian manifold. Our proposal combine the flexibility of these 
models with the complex structure of a set of explanatory variables. We prove that the resulting estimator of (3 is 
asymptotically normal under the suitable conditions. Through a simulation study, we explored the performance 
of the estimators. Finally, we applied the studied model to an example based on real dataset. 
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Introduction 

T ^The partially linear models was introduced by [6] to analyzed the relationship between the electricity usage and 

^-average daily temperature. In recent years, this model has gained a lot of attention in order to explore the nature 

f--of complex nonlinear phenomena. This model has been widely studied in the literature see for example [16], [5], 

among others. The partially linear models allow modeling the response variable with a set of predictors that 

enter linearly in the model while one of them is considered in the model nonparametrically. 
CO 

<^ ' However, in many applications, the predictors variables take values on a Riemannian manifold more than on 
t ^ilnclidean space and this structure of the variables needs to be taken into account in the estimation procedure. 

ome examples could be found in meteorology, astronomy, geology and other fields, that include distributions on 
.Spheres, tangent bundles, Lie groups, etc. Research on the statistical analysis of variables with some one of this 
Structures was studied by [4], [12] and more recently by [8], [14], [13] and [9]. 

d ' The aim of this work is to study the partially linear models when the explanatory variable t takes values on a 
Riemannian manifold, i.e. when the variable to be modeled in a nonparametric way is in a manifold. Our proposal 
combine the flexibility for these models with the complex structure of a set of explanatory variables. 

This paper is organized as follows. In Section [21 we construct estimates for this models and give a brief 
summary of the nonparametric estimation on Riemannian manifolds proposed in [13]. In Section [3J we present 
the asymptotic distribution of the regression parameter under regular assumptions on the bandwidth sequence. 
In Section HI we explored the performance of the estimators with a simulation study and we show an example 
using real data. Also, we review a cross validation procedure for partial linear models. Proofs are given in the 
Appendix. 



2 Estimators 

2.1 Model and estimators 

Let (yj,x^,tj) be an i.i.d. random vectors valued in IR P+1 x M with identically distribution to (y,x T ,t), where 
(M, g) is a Riemannian manifolds of dimension d. The partially linear model assume that the relation between 
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the response variable and the covariates (pcf can be represented as 

y t = xjp + g(U) + si 1 < i < n , (1) 

where the errors E{ are independent and independent of (x^,^) T , also -E(ej|x£, tj) = 0. In many situations, it 
seems reasonable to suppose that a relationship between the covariates x and t exists, so as in [TB] and [TJ, we will 
assume that for 1 < j < p 

= <j>j(ti) +r)ij 1 < i < n (2) 

where the errors rjij are independent. Denote 4>q{t) = E{y\t = r) and <f>{t) = (4>i(t), . . . , <fi p (t)), then we have that 
g(t) = 4'oi't) ~ 0(O T /3 an( i hence, y — 4>o(t) = (x — cf>(t)) T (3 + e. This equation suggest estimate the unknown 
functions and parameters as follows. Let (j)j(t) be the nonparametric estimators of (j>j for < j < p. Note that the 
regression functions correspond to predictors taking values in a Riemannian manifold, nonparametric kernel type 
estimators adapted to this structure was considered in [13] and also studied in [10] . An overview of this estimators 
can be found in the following Subsection. 

Returned to the estimation of the parameter (3, note that using the nonparametric estimators of the functions 
cj)j, the regression parameter can be estimate considering the least square estimators obtained minimizing 

n 

3 = argmin^TK^ - MU)) - (x* - 4>(U)) T /3} 2 . 
P i=i 

where <f)(t) = {4>i(t), . . . , 4> p {t)). Then the function g can be estimated as g(t) = 4>o{t) — 0(t) T /3. This procedure 
is consistent with the respective estimators when the explicative variable t take values on Euclidean spaces, i.e. 
the proposed estimators reduce to know estimators introduced by [B]. 



2.2 Review of Nonparametric estimators on Riemannian manifolds 
2.2.1 Preliminaries 



As in [9] we consider (M, g) a d— dimensional oriented Riemannian manifold without boundary, complete and with 
positive injectivity radius (inj g M > ). From now on, d g will denote the distance function induced by the metric 
g. Throughout this note, we will consider the concept of volume density function. For a rigorous definition of this 
function see [3] or |10j . If we consider the exponential normal chart (U,ip) of (M,g) induced by an orthonormal 

basis {vi , . . . , Vd} of T S M, then 8 s (t) = det g t (d/dipi ,d/dip 



ai(u) = expj 1 ^) + uvi for t £ U. For example, when M is lR a 
s,t G IR d and also in the case of the cylinder 6 s (t) = 
in this case, s (t) = \sen(d g (s,t))\/d g (s,t) for t / s, 
geometric definitions. 



, where d/dtpi\ t = D a .^exp s (di(0)) with 

with the canonical metric, then 8 s (t) = 1 for all 
1. In [9], we calculate the volume density on the sphere, 
-s. and 8 s (±s) = 1. See also, [9] for a discussion on the 



2.2.2 The nonparametric estimators 

Let (y±, ti), • • • , (y n , t n ) be i.i.d random objects that take values on IR x M. In order to estimate r(r) = E(y\t = r), 
Pelletier [13] proposed a nonparametric kernel type estimators. The Pelletiers idea was to build an analogue of a 
kernel on (M,g), by using a positive function of d g distance normalized by the volume density function of (M,g), 
to take into account the curvature of the manifolds. More precisely, the nonparametric estimator can be defined 

as, 

n 

r n(t) = w n,h(t, ti)yi (3) 
i=l 

with w n>h (t,ti) = 8l' 1 (ti)K{dg(t,ti)/h)/[£^ =l 8^ 1 (t k )K(d g (t,t k )/h)}- 1 where K : IR ->■ IR is a non-negative 
function, 8t(s) the volume density function on (M,g) and the bandwidth h is a sequence of real positive numbers 
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such that lim n _>. 00 h = and h < inj g M, for all n. This last requirement on the bandwidth guarantees that ([3]) 
is defined for all t S M. In [13], is derived an expression for the asymptotic pointwise bias and variance as well 
as an expression for the asymptotic integrated mean square error. On the other hand, in [U] is proposed a robust 
version that generalized these estimators and it is obtained the uniform almost sure consistency over compact set 
and derived the asymptotic distribution. 

3 Asymptotic behavior 

The theorem of this section studies the asymptotic behavior of the regression parameter estimator of the model 
under the following conditions. 

HI. Let Mo be a compact set on M such that: / is a bounded function such that inft e j^ f(t) = A > and 
inf MeMo Ot(s) = B > 0. 

H2. The sequence h is such that n/i 4 — > and nh^/logn — > oo as n — > oo. 

H3. K : IR — > 1R is a bounded nonnegative Lipschitz function of order one, with compact support [0, 1] satisfying: 
J Rd K(\\u\\)du = 1, J Rd uK(\\u\\)du = and < J Rd ||u|| 2 K(||u||)du < oo. 

HA. For any open set Uq of Md such that Mo C Uq, the functions g, <pj for 1 < j < p are of class C 2 on Uq. 

H5. The errors £{ and rjij for 1 < i < n and 1 < j < p are independent and -E?|£i| r + Yjj=i ^l^ijl 1 " < 00 f° r r — 3, 
a 2 = var(ei) > and S = E^Jrji) is a positive defined matrix. 

Remark [31 1. The fact that 8t(t) = 1 for all p £ M guarantees that the bonded of in .£/! holds. The assumptions 
H2 and i/3 are standard assumptions when dealing kernel estimators. 

Theorem [Si. Under HI to H5 we have that V^(3 - P) N(0, cr 2 ^' 1 ). 

Remark [3l2. Note that this theorem is consistent with the respective results in the Euclidean case. The obtained 
asymptotic distribution can be used to construct a Wald-type statistics to make inference on the regression 
parameter, that is, when we want to test i^o : (3 = Pq- 

4 Real example and Monte Carlo study 

4.1 Selection of the smoothing parameter 

An important issue in any smoothing procedure is the choice of the smoothing parameter. Under a nonparametric 
regression model with carriers in an Euclidean space, i.e., when M is IR d with the canonical metric, two commonly 
used approaches are L 2 cross-validation and plug-in methods. In this section, we included a cross-validation 
method for the choice of the bandwidth in the case of partially linear models. The asymptotic properties of 
data-driven estimators require further careful investigation and are beyond the scope of this paper. 

The cross-validation method constructs an asymptotically optimal data-driven bandwidth, and thus adaptive 
data-driven estimators, by minimizing CV{h) = Y^i=i[{Vi ~ 4>o,-i,h(^i)) ~ ( x * — ( / > -i.fr.(^)) T /3] 2 > where 0o,-i,h(*) and 
<t>-i t h(t) = (0i,-i,/i(*)> • • • ; 4>p-i,h{t)) denote the nonparametric estimators computed with bandwidth h using all 
the data expect the i— th observation and /3 minimize 2~2?=i[{yi ~ <Po,-i,h{ti)) — ( x i — 4>-i ^(ii)) T /3] 2 in (3. 

4.2 Simulation study 

To evaluate the performance of the estimation procedure, we conduct a simulation study. We consider two models 
in two different Riemannian manifolds, the sphere and the cylinder endowed with the metric induced by the 
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canonical metric of IR S . We performed 1000 replications of independent samples of size n = 200 according to the 
following models: 

Sphere case: The variables (yi,Xi,ti) for 1 < i < n were generated as 

Hi = fi Xi + exp {-(tji + 2t i2 + t i3 ) 2 } + Si and x« = t a + t i2 + ^3 + Vi 

where ti = (cos(#j) cos (71), sin(0j) cos (71), sin(7j)) with 6^ and ji follow a von Mises distribution with means and 
7T and concentration parameters 3 and 5, respectively. 

Cylinder case: The variables (yi,Xi,ti) for 1 < i < n were generated as 

Ui = f3 Xi + sf + sin(0») + e» and Xi = exp(^) + rji 

where ti = (cos(#j), sin(0j), Sj) with the variables 9i follow a von Mises distribution with mean tt and concentration 
parameter 3 and the variables Sj are uniform in (—2,2), i.e. ti have support in the cylinder with radius 1 and 
height between (—2,2). 

In all cases, the regression parameter (3 was taken equal 5 and the errors Ei and rji are i.i.d. normal with 
mean and standard deviation 1. In the smoothing procedure, the kernel was taken as the quadratic kernel 
K(t) = (15/16) (1 — t 2 ) 2 I(\x\ < 1) and we choose the bandwidth using a cross validation procedure described in 
Section 14.11 The distance d g for these manifolds can be found in |10] and [9] and the volume density function in 
Section [2. 2. li Table [4T2l 1 give the mean, standard deviations, mean square error for the regression estimates of j3 
and the mean of the mean square error of the regression function g over the 1000 replications. 



mean(/3) sd(/3) MSE(/3) MSE(g) 
sphere case 5.0243 0.0762 0.0064 0.081 
cylinder case 4.9845 0.0078 0.0003 0.1001 

Table l4~2l l: Performance of f3 and 'g for both models. 

In Table 14.21 1 we can see a good behavior of the estimators in the two considered schemes. In all cases, 
the mean of the mean square error of the parametric and nonparametric estimators are small and reflect a good 
performance of the proposed estimators. 

4.3 Application to real data 

In this Subsection, we applied a partially linear model to an enviroment dataset in order to study the atmospheric 
SO2 pollution incidents. The variables included in the study are the direction and the speed of the wind, the 
temperature and the SO2 concentration in the meteorologic station at Villalba (Lugo in Galicia, Spain). The data 
was recorded daily in each minute during the year 2009. The complete dataset has a structure of dependence in 
the time. Therefore to avoid this dependence we was considered a 2000-row historical matrix that was constructed 
as in [15]. In a previous work [15] applied a partial linear models to the prediction of atmospheric SO2 pollution 
incidents in the vicinity of the coal/oil-fired power station at As Pontes (A Coruha in Galicia, Spain). But in this 
case they did not consider the direction of the wind as a directional variable. The variables that we considered in 
the model was 

y~i SO2 emission is measured in jigjm 6 
xu SO2 emission in the instant i — 30 

X2i SO2 emission diference between the instant i — 35 and i — 30 

X3i the temperature in °C 

tu wind direction in radians from the north 

t2i wind speed in m/s 

Table l4~3l l: Enviromente variables considered in the model. 
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Note that the variables ti = (tu,t2i) have support in the cylinder. The maximum of the wind speed in this 
cases is 7.7 then we consider that the variable t belongs in the cylinder of high between and 10. Therefore, we 
modeled the response variable using the following model j/j = P\Xu + /32X2j + ^x^ + gfa) + £j. 

In the smoothing procedure, we considered the quadratic kernel and we choose the bandwidth using a cross 
validation procedure. Because of the computational burden of the cross-validation method, and because there 
is really no need to use this method with a sample as large as 2000, we also determined h by the split sample 
method, i.e. by dividing the historical matrix into a 1000-member training set with odd index and a 1000-member 
validation set with even index, and taking for h the value minimizing 

[n/2] 

SV(h) = £ [(y 2i - fo,E,h(t 2l )) ~ (x M - 4> E , h (t 2i )) T p] 2 . 
i=i 

where <fi E h (t) = (<j)\,E,h{t)-, ■ ■ ■ > 4>p,E,h(t)) and 0o,E,/i(*) denote the nonparametric estimators computed with band- 
width h using the data with even index and (3 minimize J2i=i [{V2i — <fio,E,h(t2i)) — ( x 2t — 4>e h(*2i)) T /3] 2 in (3. 
In this case the selected bandwidth was h sv = 2.5. Table 14.31 2 reports the estimates values of the regression 
parameters and the mean and standard deviation of nonparametric estimator g of g. Figure [431 1. a) shows the 
estimate of the regression function over a grid of 1200 points in the cylinder. To evaluate the performance of 
the partial linear model, we consider a nonparametric model to explain yi based only in the variables xu and x 2 i 
trough an unknown function rj . In this case we estimate with the Naradaya- Watson estimator with quadratic 
kernel. We compare the prediction error for both models computing, in the case of the full nonparametric model, 
we compute EP{h) = Y^li\{y2i — vi x i,2i, ^2,2i))] 2 for a grid of 100 equispaces bandwidth between 0.1 and 10. 
For the partial linear model we compute the SV(h) for the same grid of bandwidth. As we can see in Figure HT3l 1 . 
b), the partial linear model has a better level predictive and is more stable trough the bandwidth than the full 
nonparametric model. 

~Ti J 2 K Mean(£) SB(g) 
0.9728 1.090 -0.0013 0.1141 0.0145 

Table l4~3l 2: Estimates of regression parameter. 




0.0 0.5 1.0 1.5 2.0 2.5 



bandwidth 

Figure l4~3l l: a) Estimates of the regression functions, b) Comparative of the errors: the dotted line corresponds to the full nonparametric 
model and the dashed line to the partially linear model. The vertical lines corresponds to the optimal bandwidth in each cases. 
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A Appendix 

Lemma\Ml: Let (f>j(t) = <f>j(t) - J2i=i w n ,h(t, k)xij for 1 < j < p and 4> (t) = (p (t) - Yh=\ ™n,h{t, k)Vi- Under HI 



to H\ we have that \^f(ti)\ = 0(h ) + O yy\ogn/nh d j a.s. where 7 € {<j>j\ < j < p}. 

Proof of Lemma IA1 1 : Let 7 S {(fij; < j < p} and denote by 7 the corresponding nonparametric estimator, the 

j(t) = 7(i) —j(t). Using analogous arguments that those considered in [10] we have that, sup |2£(7(i))| = 0(h 2 ). 

teM 

Let s n = n 2 h 2d sup vax(j(t)f n (t)) with f n (t) = {nh d )~ l Y2=i ^"H^-^KO, t k )/h), by results obtained in [9] 

teM 

we have that s n = 0(nh d ). By HI, inf( g j\/ -^E (j^^K(d g (t,ti)/h)\ > A > 0. Then, it follows in analogous way 
that the proof of Lemma 3.1 in [?]■□ 

Lemma IA"1 2: Under HI to H4: we have that n _1 x T x — > S. 
Proof of LemmalA\2: The element I, s of n x'x can be written as 

(ran n n \ 

Y VilVis + Y 4>l{ t i) r hs + Y ^s(ti)Vil + Y <M*i)<M*i) ) 
i=l i=l i=l i=l / 

where 4>j(t) = <fij(t) — 4>j(t). We need to show that all terms except the first term converge to zero and by 
applying the strong law of large numbers we get that n~ l Y^i=i VilVis ~~ ^ S/ s . Since Lemma[Ajl and the fact that 
n ~ X Ya=i Vu an d using the Cauchy-Schwarz inequality we get the result .□ 

LemmalA"l3: Under HI to H3, we have that maxi<i,-< n \w n uipii = 0((nh d )~ 1 ). 
Proof of Lemma [Al 3: Using the results obtained in |10j and [9] we have that 



sup 

teM 



1 n 1 1/1 \ 



o(l) a.s. (4) 



!£Lt? e {ik) K[iS ' tl)/,!) ) - A>0 - 



teM 

Then by @) and ([5]) and the boundedness of K and 6t, the lemma holds.D 

Remark [Al4: Note that by Lemmas [All and [Aj3 and using Lemma A.l in [11]; we have that max \"f(ti 

Ki<n 



(5) 



k=l 



Y w ^h( t h t k) r r( t k)\ = 0{h 2 ) + (^logn/nh d ) a.s. for any 7 G {^j; < j < p}. 



Proof® 1: We can write y/n0 - (3) = (n" 1 x T x)" 1 ^ 1 / 2 [A ln - A 2n + A 3n ] where 

n n / n \ n 

Al n = Y ^i9*(U) A 2n = Y *M Y W n,h( t i'> t j) £ j ) ^3ra = Y * i£i 
i=l i=l \i=l ) i=l 

and g*(t) = g(t) — J27=i w n,h(t,ti)g(ti). Using Lemmas lAll to[Aj3, the asymptotic behavior of A\ n ,Ai n and A% n 
can be obtained in the same way that in [2]. Specifically, considering the assumptions imposed on h, we can 



obtained that 

A nl = 0{nh* + h~ d log 2 n) + 0(n 1/2 h 2 log n + h~ d/2 log 2 n) + 0(n 1/2 /i 2 /i~ d/2 log n) + 0(/i- d log 2 n) = o(n 1/2 ) 
,4 n2 = 0(n 1/2 /i 2 /i~ d/2 log n + log 2 n) + 0(/i~ d/2 log 2 re) + 0{h~ d log 2 n) = o(n 1/2 ) 

n n 

^n3 = 0(n l ' 2 h 2 log n) + 0(/i~ rf / 2 log 2 re) + £ + 0(/i- d / 2 log 2 n) = £ + o(n 1 / 2 ) 

j=i ?=i 

Finally, the central limit theorem gives the desired result. □ 
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